Paper Review

https://doi.org/10.1371/journal.pcbi.1002803

https://doi.org/10.1016/j.jmp.2012.02.005

John Tukey (1962)

Far better an approximate answer to the right question, which is often vague, than the exact answer to the wrong question, which can always be made precise.

Summary

  • In Bayesian inference, the likelihood function is a key component that connects data to parameter estimates.
  • For simple models the likelihood is available in closed form, but for complex models it is often intractable or unavailable.
  • Approximate Bayesian Computation (ABC) rejection is an algorithm for generating posterior distributions without needing to evaluate a likelihood function.
  • There are trade-offs: the choice of distance function and tolerance affects accuracy, and repeated simulation adds computation time.

Bayes’ Theorem

\[ p(\theta | D) = \frac{p(D | \theta) \times p(\theta)}{p(D)}\quad \textrm{posterior} = \frac{\textrm{likelihood} \times \textrm{prior}}{\textrm{marginal likelihood}}\]

Likelihood (Sunnaker et al. 2013):

  • “probability of the observed data under a particular statistical model”

  • “quantifies the support data lend to particular values of parameters”

Likelihood Example

What is the probability of heads, \(p\), for a coin that has been observed to come up heads three times (\(X = 3\)) in nine tosses (\(n = 9\))?

\[X \sim \mathrm{Binomial}(n, p)\qquad \mathrm{Pr}(X = x; n, p) = \binom{n}{x}p^x(1-p)^{n-x}\]

\[\begin{aligned} \ell(p) &= \mathrm{Pr}(X = 3; n = 9, p)\\ &= \frac{9!}{3!(9-3)!}p^3(1-p)^{9-3}\\ &= 84p^3(1-p)^6 \end{aligned}\]

We know the likelihood function from our knowledge of the underlying data generating process, e.g., binomial.
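As a quick check of the algebra above, the likelihood \(84p^3(1-p)^6\) can be evaluated directly. A minimal sketch (the function name `binom_likelihood` is ours, for illustration only):

```python
import math

def binom_likelihood(p, n=9, x=3):
    """Likelihood of heads-probability p: Pr(X = x; n, p) = C(n, x) p^x (1-p)^(n-x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# The likelihood peaks at the maximum-likelihood estimate p = x/n = 1/3.
for p in (0.1, 1 / 3, 0.5, 0.9):
    print(f"p = {p:.2f}  ->  likelihood = {binom_likelihood(p):.4f}")
```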

But…

What if we don’t have the likelihood function?

 

What if we’re not sure how the observed data supports particular values of the parameter we’re trying to discover?

 

What if our likelihood function is hard to write down explicitly or computationally expensive to evaluate?

Challenging Likelihoods

  • simulations of the temperature map of the CMB

  • large-scale structure of galaxy distributions

  • mass and luminosity distributions for stars and galaxies

Q: Can we sample the posterior without evaluating the likelihood?

YES! Maybe

Ingredients of ABC

ABC replaces the calculation of the likelihood function with simulation.

  1. Data
  2. Generative Model (parameters \(\theta\))
  3. Priors, \(p(\theta)\)
  4. Matching criterion (distance function), \(\rho(\cdot)\)

ABC Rejection Algorithm
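The rejection algorithm simply loops over the four ingredients listed above: draw from the prior, simulate, and keep draws that match the data. A minimal sketch, with all names illustrative rather than taken from the references:

```python
import random

def abc_rejection(observed, simulate, prior_sample, distance, eps, n_samples=500):
    """Keep prior draws whose simulated data fall within eps of the observation."""
    accepted = []
    while len(accepted) < n_samples:
        theta = prior_sample()               # 1. draw parameters from the prior
        fake = simulate(theta)               # 2. simulate data from the generative model
        if distance(fake, observed) <= eps:  # 3. accept theta if the match is close enough
            accepted.append(theta)
    return accepted

# Toy usage: recover the mean of a Normal(mu, 1) from a single observation y = 0.
random.seed(0)
posterior = abc_rejection(
    observed=0.0,
    simulate=lambda mu: random.gauss(mu, 1.0),
    prior_sample=lambda: random.uniform(-5, 5),
    distance=lambda a, b: abs(a - b),
    eps=0.5,
)
```

Shrinking \(\varepsilon\) improves the approximation but lowers the acceptance rate, which is exactly the computation-time trade-off noted earlier.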

Distance Functions

Is the simulated data close to the observed data?

\[\rho(\hat{D},D ) \le \varepsilon\]

Common distance measures include:

  • Euclidean distance of every point
  • Summary statistics
    • e.g., distance between means

This is a tricky choice that heavily affects both the accuracy of the approximation and the computation time.

Example
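As a concrete run, we can apply ABC rejection to the coin example from earlier (the code is our illustration, not from the reviewed papers). With \(x = 3\) heads in \(n = 9\) tosses, a uniform prior, discrete data, and \(\varepsilon = 0\), the accepted draws are exact samples from the true posterior, which is \(\mathrm{Beta}(4, 7)\) with mean \(4/11 \approx 0.364\):

```python
import random

random.seed(42)
n, x_obs = 9, 3        # observed data: 3 heads in 9 tosses

accepted = []
while len(accepted) < 2000:
    p = random.random()                                  # draw p from the Uniform(0, 1) prior
    x_sim = sum(random.random() < p for _ in range(n))   # simulate 9 tosses with this p
    if x_sim == x_obs:                                   # rho = |x_sim - x_obs|, eps = 0
        accepted.append(p)

post_mean = sum(accepted) / len(accepted)
print(f"ABC posterior mean: {post_mean:.3f}")  # exact posterior Beta(4, 7) has mean 4/11
```

Note that the exact match \(\varepsilon = 0\) is only feasible because the data are low-dimensional and discrete; for continuous data a positive tolerance is unavoidable.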

Model Comparison

The ratio of posterior model probabilities indicates which model is better supported by the data.

\[\frac{p(M_1|D)}{p(M_2|D)} = \frac{p(D|M_1) p(M_1)}{p(D|M_2) p(M_2)} = B_{1,2}\frac{p(M_1)}{p(M_2)}\]

\(B_{1,2}\) is known as the Bayes Factor.
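The Bayes factor can itself be estimated without likelihoods: for discrete data with \(\varepsilon = 0\), the acceptance rate of the rejection sampler under each model is a Monte Carlo estimate of the marginal likelihood \(p(D|M)\), so the ratio of acceptance rates estimates \(B_{1,2}\). A sketch reusing the coin data, with the two models and all names chosen by us for illustration:

```python
import random

random.seed(1)
n, x_obs = 9, 3

def toss(p):
    """Simulate the number of heads in n tosses of a coin with heads-probability p."""
    return sum(random.random() < p for _ in range(n))

def acceptance_rate(prior_sample, trials=20000):
    """Fraction of prior draws whose simulated data exactly match the observation (eps = 0)."""
    return sum(toss(prior_sample()) == x_obs for _ in range(trials)) / trials

rate1 = acceptance_rate(lambda: random.random())  # M1: p ~ Uniform(0, 1); p(D|M1) = 1/10
rate2 = acceptance_rate(lambda: 0.5)              # M2: fair coin;         p(D|M2) = 84/512
print(f"Estimated B_12 = {rate1 / rate2:.2f}")    # true value: 0.1 / (84/512) ~ 0.61
```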

Some Pitfalls

  1. Bias due to a non-zero value of \(\varepsilon\)

    • the algorithm samples from \(p(\theta|\rho(\hat{D}, D) \le \varepsilon)\) rather than \(p(\theta|D)\).
  2. Many researcher degrees of freedom:

    • choice of generative model, number of simulations, choice of summary statistics, size of acceptance threshold.
  3. Curse of Dimensionality

    • with high-dimensional data or parameters, acceptable simulations become rare and sampling takes much longer.

Take home messages

  • A method to circumvent intractable or ill-behaved likelihood functions.

  • Computationally more expensive than standard Bayesian samplers.

  • Choice of the distance function \(\rho(\cdot)\) and tolerance threshold \(\varepsilon\) needs careful attention.

  • Not yet feasible for high-dimensional problems (work in progress!)